Kleene's Algorithm

In theoretical computer science, in particular in formal language theory, Kleene's algorithm transforms a given nondeterministic finite automaton (NFA) into a regular expression. Together with other conversion algorithms, it establishes the equivalence of several description formats for regular languages. Alternative presentations of the same method include the "elimination method" attributed to Brzozowski and McCluskey, the algorithm of McNaughton and Yamada, and the use of Arden's lemma.


Algorithm description

According to Gross and Yellen (2004, sect. 2.1, remark R13 on p. 65), the algorithm can be traced back to Kleene (1956). A presentation of the algorithm in the case of deterministic finite automata (DFAs) is given in Hopcroft and Ullman (1979). The presentation of the algorithm for NFAs below follows Gross and Yellen (2004).

Given a nondeterministic finite automaton ''M'' = (''Q'', Σ, δ, <math>q_0</math>, ''F''), with ''Q'' = { <math>q_0, \dots, q_n</math> } its set of states, the algorithm computes
:the sets <math>R^k_{ij}</math> of all strings that take ''M'' from state <math>q_i</math> to <math>q_j</math> without going through any state numbered higher than ''k''.
Here, "going through a state" means entering ''and'' leaving it, so both ''i'' and ''j'' may be higher than ''k'', but no intermediate state may. Each set <math>R^k_{ij}</math> is represented by a regular expression; the algorithm computes them step by step for ''k'' = -1, 0, ..., ''n''. Since there is no state numbered higher than ''n'', the regular expression <math>R^n_{0j}</math> represents the set of all strings that take ''M'' from its start state <math>q_0</math> to <math>q_j</math>. If ''F'' = { <math>q_1, \dots, q_f</math> } is the set of accept states, the regular expression <math>R^n_{01} \mid \dots \mid R^n_{0f}</math> represents the language accepted by ''M'', where | denotes alternation.

The initial regular expressions, for ''k'' = -1, are computed as follows for ''i''≠''j'':
:<math>R^{-1}_{ij} = a_1 \mid \dots \mid a_m</math>       where <math>q_j \in \delta(q_i, a_1), \dots, q_j \in \delta(q_i, a_m)</math>
and as follows for ''i''=''j'':
:<math>R^{-1}_{ii} = a_1 \mid \dots \mid a_m \mid \varepsilon</math>       where <math>q_i \in \delta(q_i, a_1), \dots, q_i \in \delta(q_i, a_m)</math>
In other words, <math>R^{-1}_{ij}</math> mentions all letters that label a transition from ''i'' to ''j'', and ε is also included in the case where ''i''=''j''. If no letter labels such a transition, <math>R^{-1}_{ij}</math> is ∅ for ''i''≠''j'' and ε for ''i''=''j''.

After that, in each step the expressions <math>R^k_{ij}</math> are computed from the previous ones by
:<math>R^k_{ij} = R^{k-1}_{ik} \left(R^{k-1}_{kk}\right)^* R^{k-1}_{kj} \mid R^{k-1}_{ij}</math>

Another way to understand the operation of the algorithm is as an "elimination method", where the states from 0 to ''n'' are successively removed: when state ''k'' is removed, the regular expression <math>R^{k-1}_{ij}</math>, which describes the words that label a path from state ''i''>''k'' to state ''j''>''k'', is rewritten into <math>R^k_{ij}</math> so as to take into account the possibility of going via the "eliminated" state ''k''.

By induction on ''k'', it can be shown that the length of each expression <math>R^k_{ij}</math> is at most <math>\tfrac{1}{3}\left(4^{k+1}(6s+7) - 4\right)</math> symbols, where ''s'' denotes the number of characters in Σ. The base case ''k'' = -1 gives 2''s''+1 symbols (''s'' letters, ε, and ''s'' alternation bars), and each step at most quadruples the length and adds four symbols for the parentheses, the star, and the bar. Therefore, the length of the regular expression representing the language accepted by ''M'' is at most <math>\tfrac{1}{3}\left(4^{n+1}(6s+7)f - f - 3\right)</math> symbols, where ''f'' denotes the number of final states. This exponential blowup is inevitable, because there exist families of DFAs for which any equivalent regular expression must be of exponential size (stated as Theorem 16 in the cited source).

In practice, the size of the regular expression obtained by running the algorithm can be very different depending on the order in which the states are considered by the procedure, i.e., the order in which they are numbered from 0 to ''n''.
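The recurrence described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `kleene`, the dictionary encoding of δ, the use of `None` for the empty language, and the choice of Python `re` syntax for the output (with `(?:)` standing for ε) are all choices made here for the example. States are numbered from 0, with state 0 as the start state, so `n_states` corresponds to ''n''+1.

```python
import re

EMPTY = None    # assumed marker for the empty language (it has no regex in `re` syntax)
EPS = "(?:)"    # matches only the empty string


def union(r, s):
    """Regex for L(r) ∪ L(s); EMPTY is the neutral element."""
    if r is EMPTY:
        return s
    if s is EMPTY:
        return r
    return r if r == s else f"(?:{r}|{s})"


def concat(r, s):
    """Regex for L(r)L(s); EMPTY annihilates, EPS is the neutral element."""
    if r is EMPTY or s is EMPTY:
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return r + s  # safe: every operand is a single letter or a (?:...) group sequence


def star(r):
    """Regex for L(r)*; the star of the empty language or of ε is ε."""
    if r is EMPTY or r == EPS:
        return EPS
    return f"(?:{r})*"


def kleene(n_states, delta, accepting):
    """delta maps (state, letter) to a set of successor states."""
    n = n_states
    # k = -1: ε on the diagonal, plus every letter labelling a transition i -> j.
    R = [[EPS if i == j else EMPTY for j in range(n)] for i in range(n)]
    for (i, a), targets in delta.items():
        for j in targets:
            R[i][j] = union(R[i][j], re.escape(a))
    # k = 0 .. n-1: additionally allow paths that pass through state k.
    for k in range(n):
        loop = star(R[k][k])
        R = [[union(concat(concat(R[i][k], loop), R[k][j]), R[i][j])
              for j in range(n)] for i in range(n)]
    # Alternation over all accept states, starting from state 0.
    result = EMPTY
    for j in sorted(accepting):
        result = union(result, R[0][j])
    return result
```

Apart from dropping ∅ and ε in trivial positions, the sketch performs no simplification, so its output grows quickly with the number of states, in line with the exponential bound discussed above.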


Example

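The automaton in this example can be described as ''M'' = (''Q'', Σ, δ, <math>q_0</math>, ''F'') with
* the set of states ''Q'' = { <math>q_0, q_1, q_2</math> },
* the input alphabet Σ = { ''a'', ''b'' },
* the transition function δ with <math>\delta(q_0,a)=q_0</math>,   <math>\delta(q_0,b)=q_1</math>,   <math>\delta(q_1,a)=q_2</math>,   <math>\delta(q_1,b)=q_1</math>,   <math>\delta(q_2,a)=q_1</math>, and <math>\delta(q_2,b)=q_1</math>,
* the start state <math>q_0</math>, and
* the set of accept states ''F'' = { <math>q_1</math> }.

Kleene's algorithm computes the initial regular expressions as
:<math>R^{-1}_{00} = a \mid \varepsilon, \quad R^{-1}_{01} = b, \quad R^{-1}_{02} = \emptyset</math>
:<math>R^{-1}_{10} = \emptyset, \quad R^{-1}_{11} = b \mid \varepsilon, \quad R^{-1}_{12} = a</math>
:<math>R^{-1}_{20} = \emptyset, \quad R^{-1}_{21} = a \mid b, \quad R^{-1}_{22} = \varepsilon</math>

After that, the <math>R^k_{ij}</math> are computed from the <math>R^{k-1}_{ij}</math> step by step for ''k'' = 0, 1, 2. Kleene algebra equalities are used to simplify the regular expressions as much as possible; only the simplified results are listed.

; Step 0
:<math>R^0_{00} = a^*, \quad R^0_{01} = a^* b, \quad R^0_{02} = \emptyset</math>
:<math>R^0_{10} = \emptyset, \quad R^0_{11} = b \mid \varepsilon, \quad R^0_{12} = a</math>
:<math>R^0_{20} = \emptyset, \quad R^0_{21} = a \mid b, \quad R^0_{22} = \varepsilon</math>

; Step 1
:<math>R^1_{00} = a^*, \quad R^1_{01} = a^* b^* b, \quad R^1_{02} = a^* b^* b a</math>
:<math>R^1_{10} = \emptyset, \quad R^1_{11} = b^*, \quad R^1_{12} = b^* a</math>
:<math>R^1_{20} = \emptyset, \quad R^1_{21} = (a \mid b) b^*, \quad R^1_{22} = (a \mid b) b^* a \mid \varepsilon</math>

; Step 2
:<math>R^2_{00} = a^*, \quad R^2_{01} = a^* b \left(b \mid a (a \mid b)\right)^*, \quad R^2_{02} = a^* b^* b a \left((a \mid b) b^* a\right)^*</math>
:<math>R^2_{10} = \emptyset, \quad R^2_{11} = \left(b \mid a (a \mid b)\right)^*, \quad R^2_{12} = \left(b \mid a (a \mid b)\right)^* a</math>
:<math>R^2_{20} = \emptyset, \quad R^2_{21} = (a \mid b) \left(b \mid a (a \mid b)\right)^*, \quad R^2_{22} = \left((a \mid b) b^* a\right)^*</math>

Since <math>q_0</math> is the start state and <math>q_1</math> is the only accept state, the regular expression <math>R^2_{01}</math> denotes the set of all strings accepted by the automaton.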
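A result of this kind can be sanity-checked by brute force. The sketch below simulates the example automaton directly from its transition function and compares it, on all strings up to a given length, with the candidate expression `a*b(b|a(a|b))*`. That expression is an assumption of this example: it is one simplified form obtained by hand for the strings leading from the start state to the accept state, and the names `DELTA`, `dfa_accepts`, and `first_mismatch` are likewise hypothetical. Agreement on short strings is evidence, not a proof of equivalence.

```python
import re
from itertools import product

# Transition function of the example automaton: (state, letter) -> next state.
DELTA = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "q2", ("q1", "b"): "q1",
    ("q2", "a"): "q1", ("q2", "b"): "q1",
}
START = "q0"
ACCEPT = {"q1"}

# Hand-derived candidate for the accepted language (an assumption to be checked).
CANDIDATE = re.compile(r"a*b(b|a(a|b))*")


def dfa_accepts(word):
    """Run the automaton on `word` and report whether it ends in an accept state."""
    state = START
    for letter in word:
        state = DELTA[(state, letter)]
    return state in ACCEPT


def first_mismatch(max_len):
    """Return the first string over {a, b} on which automaton and regex disagree, else None."""
    for length in range(max_len + 1):
        for letters in product("ab", repeat=length):
            word = "".join(letters)
            if dfa_accepts(word) != bool(CANDIDATE.fullmatch(word)):
                return word
    return None


if __name__ == "__main__":
    print(first_mismatch(10))
```

The check enumerates every string of length at most `max_len`, which is exponential in `max_len` but entirely adequate for a three-state automaton.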


See also

* Floyd–Warshall algorithm: an algorithm on weighted graphs that can be implemented by Kleene's algorithm using a particular Kleene algebra
* Star height problem: what is the minimum nesting depth of stars over all regular expressions corresponding to a given DFA?
* Generalized star height problem: if a complement operator is additionally allowed in regular expressions, can the nesting depth of stars in the output of Kleene's algorithm be limited to a fixed bound?
* Thompson's construction algorithm: transforms a regular expression into a finite automaton
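The relationship to the Floyd–Warshall algorithm noted in this list can be illustrated with a small sketch. It runs the same three-index recurrence over an abstract algebra given by `plus`, `times`, and `star`, and then instantiates it with minimum as "alternation", addition as "concatenation", and a star that is constantly 0. The function names `kleene_closure` and `shortest_paths` are hypothetical, and the constant star is valid only under the assumption that the graph has no negative-weight cycles.

```python
import math


def kleene_closure(R, plus, times, star):
    """Apply R[i][j] = plus(R[i][j], R[i][k] * star(R[k][k]) * R[k][j]) for k = 0..n-1."""
    n = len(R)
    for k in range(n):
        loop = star(R[k][k])
        # The comprehension reads only the matrix from the previous step.
        R = [[plus(R[i][j], times(times(R[i][k], loop), R[k][j]))
              for j in range(n)] for i in range(n)]
    return R


def shortest_paths(weights):
    """All-pairs shortest path lengths; weights[i][j] is an edge weight or math.inf.

    Assumes there are no negative-weight cycles, so the star of any loop weight is 0.
    """
    n = len(weights)
    # The diagonal gets the neutral element of `times` (0), the analogue of ε.
    init = [[min(0.0, weights[i][j]) if i == j else weights[i][j]
             for j in range(n)] for i in range(n)]
    return kleene_closure(
        init,
        plus=min,                      # choose the cheaper of two alternatives
        times=lambda x, y: x + y,      # path lengths add under concatenation
        star=lambda x: 0.0,            # repeating a non-negative loop never helps
    )


if __name__ == "__main__":
    inf = math.inf
    graph = [[inf, 3, 10],
             [inf, inf, 2],
             [1, inf, inf]]
    for row in shortest_paths(graph):
        print(row)
```

With regular expressions in place of numbers, the same loop structure yields the automaton-to-expression conversion; only the three operations change.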

